intensity representation
Neural Networks Learn Distance Metrics
Neural networks may naturally favor distance-based representations, where smaller activations indicate closer proximity to learned prototypes. This contrasts with intensity-based approaches, which rely on activation magnitudes. To test this hypothesis, we conducted experiments with six MNIST architectural variants constrained to learn either distance or intensity representations. Our results reveal that the underlying representation affects model performance. We develop a novel geometric framework that explains these findings and introduce OffsetL2, a new architecture based on Mahalanobis distance equations, to further validate this framework. This work highlights the importance of considering distance-based learning in neural network design.
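To make the contrast in the abstract concrete, here is a minimal sketch of the two readings of an output layer. It is not the paper's code: the function names are hypothetical, the prototypes are toy values, and `mahalanobis_diag` is the textbook Mahalanobis distance restricted to a diagonal covariance, not the OffsetL2 layer, whose exact form the abstract does not give.

```python
import math

def intensity_predict(x, weights, biases):
    """Intensity reading: each unit scores x by a dot product plus bias,
    and the unit with the LARGEST activation wins."""
    scores = [
        sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
        for w, b in zip(weights, biases)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

def distance_predict(x, prototypes):
    """Distance reading: each unit outputs the Euclidean distance from x
    to its prototype, and the unit with the SMALLEST activation wins."""
    dists = [math.dist(x, p) for p in prototypes]
    return min(range(len(dists)), key=dists.__getitem__)

def mahalanobis_diag(x, mean, variances):
    """Mahalanobis distance for a diagonal covariance (assumed here for
    simplicity): sqrt(sum((x_i - mean_i)^2 / variance_i))."""
    return math.sqrt(
        sum((x_i - m_i) ** 2 / v_i for x_i, m_i, v_i in zip(x, mean, variances))
    )

# Toy values, chosen only so the two readings disagree.
prototypes = [[1.0, 0.0], [5.0, 0.0]]
x = [1.0, 0.0]
closest = distance_predict(x, prototypes)                 # x sits on prototype 0
strongest = intensity_predict(x, prototypes, [0.0, 0.0])  # prototype 1 has the larger dot product
```

With these toy values the distance reading selects unit 0 while the intensity reading selects unit 1, which shows that the same weights can yield different predictions depending on which representation a layer is taken to encode.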
Fine-grained Emotional Control of Text-To-Speech: Learning To Rank Inter- And Intra-Class Emotion Intensities
Wang, Shijun, Guðnason, Jón, Borth, Damian
State-of-the-art Text-To-Speech (TTS) models are capable of producing high-quality speech. The generated speech, however, is usually neutral in emotional expression, whereas very often one would want fine-grained emotional control of words or phonemes. Although still challenging, the first TTS models have been recently proposed that are able to control voice by manually assigning emotion intensity. Unfortunately, due to the neglect of intra-class distance, the intensity differences are often unrecognizable.
Nevertheless, the nuance of references might be difficult to be captured by these models (e.g. one sad and one depressed reference might produce the same synthesized speech), due to a mismatch between the content or speaker of the reference and synthesized speech, which implies the inflexible controllability of these models. A better approach to achieve fine-grained controllable emotional TTS is by manually assigning intensity labels (such as strong or weak happiness) on words or phonemes, which provides a flexible and efficient way to control the emotion ...